(54) Speech synthesizing apparatus and method, and storage medium therefor



(57) A speech synthesizing apparatus for synthesizing a speech
waveform stores speech data, which is obtained by adding attribute
information onto phoneme data, in a database. In accordance with
prescribed retrieval conditions, a phoneme retrieval unit retrieves
phoneme data from the speech data that has been stored in the database
and retains the retrieved results in a retrieved-result storage area.
A processing unit for assigning a power penalty and a processing unit
for assigning a phoneme-duration penalty assign the penalties, on the
basis of power and phoneme duration constituting the attribute
information, to a set of phoneme data stored in the retrieved-result
storage area. A processing unit for determining typical phoneme data
performs sorting on the basis of the assigned penalties and, based
upon the stored results, selects phoneme data to be employed in the
synthesis of a speech waveform.
Description

BACKGROUND OF THE INVENTION 

[0001] This invention relates to a speech synthesizing apparatus
having a database for managing phoneme data, in which the apparatus
performs speech synthesis using the phoneme data managed by the
database. The invention further relates to a method of synthesizing
speech using this apparatus, and to a storage medium storing a program
for implementing this method.

[0002] A method of speech synthesis which concatenates waveforms
(referred to below as the "concatenative synthesis method") is
available in the prior art as a method of synthesizing speech. The
concatenative synthesis method changes prosody with the pitch
synchronous overlap-add (P-SOLA) method, which places pitch waveform
units extracted from the original waveform unit in conformity with a
desired pitch timing. An advantage of the concatenative synthesis
method is that the synthesized speech obtained is more natural than
that provided by a synthesis method based upon parameters. A
disadvantage is that the allowable range for the change in prosody is
narrow.

[0003] Accordingly, sound quality is improved by preparing speech data
covering a wide variety of variations, selecting appropriately from
them and using the selected data. Information such as the phoneme
environment (the phoneme that is the object of synthesis, or several
phonemes including those on both sides thereof) and the fundamental
frequency F0 is used as the criteria for selecting the synthesis unit.
[0004] However, the conventional method of synthesizing speech
described above involves a number of problems.

[0005] By way of example, if a database contains a plurality of items
of phoneme data which satisfy a certain phoneme environment and
fundamental frequency F0, the phoneme unit used in synthesis is one
phoneme unit (e.g., the phoneme unit that appears in the database
first) selected randomly from these items of phoneme data. Since the
database is a collection of speech uttered by human beings, not all of
the phoneme data is necessarily stable (i.e., of good quality). The
database may contain phoneme data that is the result of mumbling, a
halting voice, slowness of speech or hoarseness. If one item of
phoneme data is selected randomly from such a collection of data,
naturally there is the possibility that sound quality will decline
when synthesized speech is generated.

SUMMARY OF THE INVENTION 

[0006] Accordingly, an object of the present invention is to provide a
speech synthesizing apparatus and method capable of appropriately
selecting phoneme data used in speech synthesis and of suppressing any
decline in sound quality in speech synthesis, as well as a storage
medium storing a program for implementing this method.

[0007] According to one aspect of the present inven- 
tion, the foregoing object is attained by providing a 
speech synthesizing apparatus comprising: storage 
means for storing plural items of phoneme data; retrieval 
means for retrieving phoneme data, in accordance with 
given retrieval conditions, from the plural items of pho- 
neme data stored in the storage means; penalty assign- 
ing means for assigning a penalty that is based upon an 
attribute value to each item of phoneme data retrieved 
by the retrieval means; and selection means for select- 
ing, from the phoneme data retrieved by the retrieval 
means, and based upon the penalty assigned by the 
penalty assigning means, phoneme data to be em- 
ployed in synthesis of a speech waveform. 
[0008] According to another aspect of the present invention, the
foregoing object is attained by providing a speech synthesizing method
comprising: a storage step of storing plural items of phoneme data; a
retrieval step of retrieving phoneme data, in accordance with given
retrieval conditions, from the plural items of phoneme data stored at
the storage step; a penalty assigning step of assigning a penalty that
is based upon an attribute value to each item of phoneme data
retrieved at the retrieval step; and a selection step of selecting,
from the phoneme data retrieved at the retrieval step, and based upon
the penalty assigned at the penalty assigning step, phoneme data
employed in synthesis of a speech waveform.

[0009] The present invention further provides a stor- 
age medium storing a control program for causing a 
computer to implement the method of synthesizing 
speech described above. 

[0010] Other features and advantages of the present
invention will be apparent from the following description 
taken in conjunction with the accompanying drawings, 
in which like reference characters designate the same 
or similar parts throughout the figures thereof. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0011] The accompanying drawings, which are incor- 
porated in and constitute a part of the specification, 
illustrate embodiments of the invention and, together 
with the description, serve to explain the principles of 
the invention. 

Fig. 1 is a block diagram showing the construction 
of a speech synthesizing apparatus according to a 
first embodiment of the present invention; 
Fig. 2 is a block diagram illustrating functions relat- 
ing to phoneme data selection processing accord- 
ing to the first embodiment; 

Fig. 3 is a flowchart illustrating a procedure relating 
to phoneme data selection processing according to 
the first embodiment; 
Fig. 4 is a block diagram illustrating functions relating to phoneme
data selection processing according to the second embodiment;
Fig. 5 is a flowchart illustrating a procedure relating to phoneme
data selection processing according to the second embodiment; and
Fig. 6 is a flowchart useful in describing an overview of speech
synthesizing processing.

DESCRIPTION OF THE PREFERRED
EMBODIMENTS 

[0012] Preferred embodiments of the present invention will now be
described in detail in accordance with the accompanying drawings.

[First Embodiment] 

[0013] Fig. 1 is a block diagram illustrating the construction of a
speech synthesizing apparatus according to a first embodiment of the
present invention.
[0014] As shown in Fig. 1, the apparatus includes a control memory
(ROM) 101 which stores a control program for causing a computer to
implement control in accordance with a control procedure shown in
Fig. 3, a central processing unit 102 for executing processing such as
decisions and calculations in accordance with the control procedure
retained in the control memory 101, and a memory (RAM) 103 which
provides a work area for when the central processing unit 102 executes
various control operations. Allocated to the memory 103 are an area
202 for holding the results of phoneme retrieval, an area 204 for
holding the results of penalty assignment, an area 207 for holding the
results of sorting, and an area 209 for holding representative phoneme
data. These areas will be described later with reference to Fig. 2.
The apparatus further includes a disk device 104 which, in this
embodiment, is a hard disk. The disk device 104 stores a database 200
described later with reference to Fig. 2. The data of database 200 is
stored in memory 103 when the data is used. A bus 105 connects the
components mentioned above.
[0015] The speech synthesizing apparatus of this embodiment uses
information such as the phoneme environment and fundamental frequency
to select the appropriate phoneme data from speech data that has been
recorded in the database 200 (Fig. 2) and performs waveform editing
synthesis employing the selected data.

[0016] Fig. 6 is a flowchart illustrating an overview of speech
synthesizing processing according to this embodiment. The phoneme
environment and fundamental frequency of a phoneme to be used are
specified at step S11 in Fig. 6. This may be carried out by storing
the phoneme environment and fundamental frequency in the disk device
104 as a parameter file or by entering them via a keyboard. Next, at
step S12, phoneme data to be used is selected from the database 200.
This is followed by step S13, at which it is determined whether any
phoneme data remains to be selected. Control returns to step S11 if
such data remains. If it is determined that all necessary phoneme data
has been selected, on the other hand, control proceeds from step S13
to step S14 and speech synthesis by waveform editing is executed using
the selected phoneme data.

[0017] The details of processing for selecting the phoneme data at
step S12 will now be described. In the case described below, selection
of phoneme data is carried out using the phoneme environment (three
phonemes composed of the phoneme of interest and one phoneme on each
side thereof, these being referred to as a so-called "triphone") and
the average fundamental frequency of the phoneme as criteria for
selecting phoneme data.

[0018] Fig. 2 is a block diagram illustrating functions 
relating to phoneme data selection processing for se- 
lecting the optimum phoneme data from a set of pho- 
neme data in which the phoneme environments and fun- 
damental frequencies are identical. The functions are 
those of a speech synthesizing apparatus according to 
the first embodiment. 

[0019] The database 200 in Fig. 2 stores speech data in which a
phoneme environment, phoneme boundary, fundamental frequency, power
and phoneme duration have been assigned to each item of phoneme data.
A phoneme retrieval unit 201 retrieves phoneme data, which satisfies a
specific phoneme environment and fundamental frequency, from the
database 200. The area 202 stores a set of phoneme data, namely the
results of retrieval performed by the phoneme retrieval unit 201. A
power-penalty assignment processing unit 203 assigns a penalty related
to power to each item of phoneme data of the set of phoneme data
stored in the area 202. The area 204 holds the results of the
assignment of penalties to the phoneme data. A duration-penalty
assignment processing unit 205 assigns a penalty relating to phoneme
duration to each item of phoneme data.
[0020] A sorting processing unit 206 subjects the set 
of phoneme data to sorting processing regarding spe- 
cific information (power or phoneme duration, etc.) 
when a penalty is assigned. The area 207 holds the re- 
sults of sorting. In regard to the results obtained by as- 
signing penalties, a data determination processing unit 
208 selects phoneme data having the smallest penalty 
as representative phoneme data. The area 209 holds 
the representative phoneme data that has been decid- 
ed. 
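
The database record and the retrieval unit 201 described above can be
pictured with a minimal sketch. The class name PhonemeItem, the field
layout, the units and the f0_tolerance parameter are all assumptions
made for illustration; the specification does not state how a match on
fundamental frequency is decided.

```python
from dataclasses import dataclass


@dataclass
class PhonemeItem:
    """One item of phoneme data plus its attribute information (hypothetical layout)."""
    triphone: tuple      # phoneme environment: (left, center, right)
    boundary: tuple      # phoneme boundary as (start, end); units are assumed
    f0: float            # average fundamental frequency; Hz is assumed
    power: float
    duration: float      # phoneme duration; milliseconds are assumed
    penalty: float = 0.0


def retrieve(database, triphone, f0, f0_tolerance=0.0):
    """Return every item that matches the phoneme environment and F0.

    Assumption: an item "satisfies" the F0 condition when it lies within
    f0_tolerance of the requested value.
    """
    return [item for item in database
            if item.triphone == triphone and abs(item.f0 - f0) <= f0_tolerance]
```

The list returned by retrieve plays the role of area 202; later steps
would update the penalty field of each item.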

[0021] From the speech synthesizing processing set 
forth above, processing for selecting phoneme data im- 
plemented by the above-described functional arrange- 
ment will be discussed next. Fig. 3 is a flowchart illus- 
trating a procedure relating to phoneme data selection 
processing for selecting the optimum phoneme data 
from the set of phoneme data having identical phoneme 
environments and fundamental frequencies. 
[0022] First, at step S301, all phoneme data that satisfies the
phoneme environment (triphone) and fundamental frequency F0 that were
specified at step S11 is extracted from the database 200 and is stored
in area 202. Next, at step S302, the power-penalty assignment
processing unit 203 assigns power-related penalties to the set of
phoneme data that has been stored in area 202.

[0023] The guideline involving power-related penalties is to assign
large penalties to phoneme data having power values that depart from
an average value of power because the goal is to select phoneme data
having an average value of power within the set of phoneme data. The
power-penalty assignment processing unit 203 instructs the sorting
processing unit 206 to sort the phoneme data set, which has been
extracted from the area 202 that holds the results of retrieval, based
upon values of power. Power referred to here may be the power of the
phoneme data or the average power per unit of time.

[0024] The sorting processing unit 206 responds by sorting the phoneme
data set based upon power and storing the results in the area 207 that
is for retaining the results of sorting. The power-penalty assignment
processing unit 203 waits for sorting to end and then assigns a
penalty to the sorted phoneme data that has been stored in area 207. A
penalty is assigned in accordance with the guideline mentioned above.
For example, among items of phoneme data that have been sorted in
order of decreasing power, a penalty (e.g., 2.0 points) is added onto
phoneme data whose power values fall within the smaller one-third of
values and onto phoneme data whose power values fall within the larger
one-third of values. In other words, a penalty is assigned to phoneme
data other than the middle one-third of phoneme data.
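
The rank-based rule just described can be sketched as follows. This is
only an illustration: the function name, the use of dictionaries for
phoneme data, and the rounding rule (floor(n / 3) items penalized at
each end of the ranking) are assumptions, since the text does not say
what happens when the number of items is not divisible by three.

```python
def add_outer_third_penalty(items, key, points=2.0):
    """Penalize items outside the middle third when ranked by `key`.

    Assumption: with n items, floor(n / 3) items at each end of the
    ranking are penalized; the source does not specify the rounding.
    """
    # Sort in order of decreasing attribute value, as in the description.
    ordered = sorted(items, key=lambda item: item[key], reverse=True)
    n = len(ordered)
    third = n // 3
    for rank, item in enumerate(ordered):
        outside_middle = rank < third or rank >= n - third
        extra = points if outside_middle else 0.0
        item["penalty"] = item.get("penalty", 0.0) + extra
    return ordered
```

Because the attribute is passed as `key`, the same helper would serve
for both power and phoneme duration.
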
[0025] Next, at step S303, the duration-penalty assignment processing
unit 205 assigns a penalty relating to phoneme duration through a
procedure similar to that of the power-penalty assignment processing
unit 203. Specifically, the duration-penalty assignment processing
unit 205 instructs the sorting processing unit 206 to perform sorting
based upon phoneme duration and stores the results in area 207. On the
basis of the sorted results, the duration-penalty assignment
processing unit 205 adds a penalty (e.g., 2.0 points) onto phoneme
data whose phoneme durations fall within the smaller one-third of
durations and onto phoneme data whose phoneme durations fall within
the larger one-third of durations. The results obtained by the
assignment of the penalty are retained in area 204. Control then
proceeds to step S304.

[0026] Step S304 calls for the data determination processing unit 208
to determine a representative phoneme unit in terms of the phoneme
environment and fundamental frequency currently of interest. Here the
set of phoneme data to which penalties based upon power and phoneme
duration have been assigned, stored in area 204, is delivered to the
sorting processing unit 206, and the sorting processing unit 206 is
instructed to sort the results by penalty value. The sorting
processing unit 206 performs sorting on the basis of the two types of
penalties relating to power and phoneme duration (e.g., using the sum
of the two penalty values) and stores the sorted results in area 207.
When sorting processing ends, the data determination processing unit
208 selects phoneme data having the smallest penalty and stores it in
area 209 for the purpose of employing this data as representative
phoneme data. If a plurality of phoneme units having the minimum
penalty value appear, the data determination processing unit 208
selects the phoneme unit located at the head of the sorted results.
This is equivalent to selecting one phoneme unit randomly from those
having the smallest penalty.
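
Steps S302 to S304 can be combined into one short sketch. The function
names and the dictionary representation are hypothetical, the two
penalties are combined by summation (the example the text itself
gives), and floor(n / 3) items are penalized at each end of each
ranking, which is an assumption. Python's sorted is stable, so among
items tied on the minimum penalty the one that came first in the input
ends up at the head.

```python
def add_outer_third_penalty(items, key, points=2.0):
    """Add `points` to items outside the middle third when ranked by `key`."""
    ordered = sorted(items, key=lambda item: item[key], reverse=True)
    n = len(ordered)
    third = n // 3  # assumed rounding rule
    for rank, item in enumerate(ordered):
        if rank < third or rank >= n - third:
            item["penalty"] += points


def select_representative(items):
    """Return the item with the smallest summed penalty (first one on ties)."""
    if not items:
        raise ValueError("no phoneme data was retrieved")
    for item in items:
        item["penalty"] = 0.0
    add_outer_third_penalty(items, "power")      # step S302
    add_outer_third_penalty(items, "duration")   # step S303
    ranked = sorted(items, key=lambda item: item["penalty"])  # step S304
    return ranked[0]


candidates = [
    {"id": "a", "power": 6, "duration": 50},
    {"id": "b", "power": 5, "duration": 60},
    {"id": "c", "power": 4, "duration": 80},
    {"id": "d", "power": 3, "duration": 100},
    {"id": "e", "power": 2, "duration": 120},
    {"id": "f", "power": 1, "duration": 70},
]
best = select_representative(candidates)
```

In this made-up data set only item "c" lies in the middle third for
both power and duration, so it is the one selected.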

[0027] Thus, in accordance with the first embodiment, 
the optimum phoneme data is selected, based upon a 
penalty relating to power and a penalty relating to pho- 
neme duration, from a phoneme data set in which the 
phoneme environments and fundamental frequencies 
are identical. 

[Second Embodiment] 

[0028] The first embodiment has been described in 
regard to a case where the phoneme environment (the 
"triphone", namely the phoneme of interest and one 
phoneme on each side thereof) and the average funda- 
mental frequency F 0 of the phoneme are used as criteria 
for selecting phoneme data. However, in instances 
where the triphone of a combination not contained in the 
database is required, the need arises to use an alternate
"left-phone" (a phoneme environment comprising the
phoneme of interest and the phoneme to its left), "right- 
phone" (a phoneme environment comprising the pho- 
neme of interest and the phoneme to its right) or "phone" 
(the phoneme of interest alone). In the second embod- 
iment, therefore, there will be described a case where 
selection of phoneme data other than a specified tri- 
phone (such selected phoneme data will be referred to 
as a "triphone substitute") is taken into account. 
[0029] Fig. 4 is a block diagram illustrating functions 
relating to phoneme data selection processing for se- 
lecting the optimum phoneme data from a set of pho- 
neme data in which the phoneme environments and fun- 
damental frequencies are identical. The functions are 
those of a speech synthesizing apparatus according to 
the second embodiment. This embodiment differs from 
the first embodiment in Fig. 2 in that the apparatus fur- 
ther includes a processing unit for assigning element- 
number penalty. Other areas or units 400 to 409 corre- 
spond to the areas or units 200 to 209, respectively, of 
Fig. 2. The processing unit 410 assigns a penalty in de- 
pendence upon the number of elements in a set of pho- 
neme data. 

[0030] The speech synthesizing processing includes 
a procedure relating to phoneme data selection 
processing, which is implemented by the above-described functional
blocks, for selecting optimum phoneme data from a set of phoneme data
having identical
phoneme environments and fundamental frequencies. 
This procedure will now be described. Fig. 5 is a flow- 
chart illustrating a procedure according to the second 
embodiment relating to phoneme data selection 
processing for selecting the optimum phoneme data 
from the set of phoneme data having identical phoneme 
environments and fundamental frequencies. 
[0031] Steps S501 to S503 are similar to steps S301 to S303 (Fig. 3)
in the first embodiment. It should be noted that if a specified
triphone does not exist in the database, the triphone retrieval at
step S501 involves the retrieval of the alternate candidates
left-phone, right-phone or phone (the aforesaid "triphone
substitute"). In this case, for example, retrieval of the left-phone
is carried out first. If the left-phone does not exist in the
database, then retrieval of the right-phone is carried out. If the
right-phone does not exist, then retrieval of the phone is carried
out. Alternatively, the sequence of retrieval may differ between
vowels and consonants. For example, for a vowel, retrieval is carried
out in the sequence of left-phone, right-phone and phone. For a
consonant, retrieval is carried out in the sequence of right-phone,
left-phone and phone.
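
The fallback order described above might be sketched as follows. The
function name, the dictionary keys and the vowel set are assumptions
for illustration, and matching on fundamental frequency is left out to
keep the example short.

```python
def retrieve_with_fallback(database, left, center, right,
                           vowels=frozenset("aeiouAEIOU")):
    """Return (items, kind), trying the exact triphone before any substitute.

    `kind` is "triphone", "left-phone", "right-phone", "phone" or None.
    Assumption: `vowels` is a hypothetical set used to pick the order.
    """
    def matching(want_left, want_right):
        # None acts as a wildcard for that side of the environment.
        return [item for item in database
                if item["center"] == center
                and (want_left is None or item["left"] == want_left)
                and (want_right is None or item["right"] == want_right)]

    exact = matching(left, right)
    if exact:
        return exact, "triphone"

    left_phone = ("left-phone", left, None)
    right_phone = ("right-phone", None, right)
    if center in vowels:
        order = [left_phone, right_phone]
    else:
        order = [right_phone, left_phone]
    order.append(("phone", None, None))

    for kind, want_left, want_right in order:
        found = matching(want_left, want_right)
        if found:
            return found, kind
    return [], None
```

Returning the kind of match alongside the items lets a caller decide
whether the element-number penalty of step S505 applies.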

[0032] In the second embodiment, use of a triphone substitute means
that a specified triphone does not exist. As long as a specified
triphone is contained in the database, however, this triphone is
adopted. At step S504, therefore, it is determined whether a triphone
substitute has been obtained as the result of retrieval. If a triphone
substitute has not been obtained, i.e., if the specified triphone has
been obtained, control skips step S505 and proceeds to step S506. When
the specified triphone is retrieved, therefore, processing similar to
that of the first embodiment is executed. If it is determined at step
S504 that a triphone substitute has been retrieved, on the other hand,
control proceeds to step S505. Here the processing unit 505 assigns a
penalty in dependence upon the number of elements in the set of
phoneme data. In a case where the specified triphone is absent, the
processing unit 505 counts the number of elements contained in the
phoneme data set, the count being performed for each triphone phoneme
environment group (a group classified by the environment comprising
the phoneme concerned and one phoneme on each side thereof) of the
alternate candidate left-phone (or right-phone or phone). In this
embodiment, if the number of items of phoneme data of an applicable
triphone phoneme environment is small (two or less), then the
processing unit 505 adds a penalty (0.5 points) onto all of the
phoneme data concerned. In other words, the processing unit 505 judges
that data having only a low frequency of appearance in a sufficiently
large database is not reliable.

[0033] For example, consider a case where a triphone t.A.k does not
exist in the database and is to be replaced by a left-phone t.A.*. If
two triphones t.A.p and 20 triphones t.A.t exist in the database,
allocating a triphone substitute, which is to replace the triphone
t.A.k, from among the triphones t.A.t, of which 20 exist, will provide
a higher probability of obtaining phoneme data of good quality.
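
The element-number penalty and the t.A.* example can be sketched as
follows. The function name and dictionary keys are assumptions; the
cut-off of "two or less" and the 0.5-point penalty are the values given
for this embodiment.

```python
from collections import Counter


def add_element_count_penalty(substitutes, max_small_count=2, points=0.5):
    """Penalize substitutes whose triphone group has few members.

    Items are grouped by their own (left, center, right) environment;
    every item in a group of max_small_count or fewer gets `points`.
    """
    def environment(item):
        return (item["left"], item["center"], item["right"])

    counts = Counter(environment(item) for item in substitutes)
    for item in substitutes:
        if counts[environment(item)] <= max_small_count:
            item["penalty"] = item.get("penalty", 0.0) + points
    return counts


# Left-phone t.A.* substitutes: 2 items of t.A.p and 20 items of t.A.t.
substitutes = (
    [{"left": "t", "center": "A", "right": "p"} for _ in range(2)]
    + [{"left": "t", "center": "A", "right": "t"} for _ in range(20)]
)
counts = add_element_count_penalty(substitutes)
```

After this call the two t.A.p items each carry 0.5 points and the
twenty t.A.t items carry none, which biases the later selection toward
the larger group.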

[0034] If a penalty based upon number of elements is thus assigned,
the result is stored in area 504, which is for holding the results of
penalty assignment, and then control proceeds to step S506. Step S506
involves processing equivalent to that of step S304 in the first
embodiment. In the second embodiment, a penalty based upon number of
elements is assigned in addition to the penalty based upon power and
the penalty based upon phoneme duration. As a result, phoneme data is
selected upon taking all of these three penalties into consideration.
In a case where a specific triphone is retrieved and processing
proceeds directly from step S504 to step S506, the penalty based upon
number of elements is not taken into account.

[0035] Thus, in accordance with the second embodiment, it is possible
to select the proper phoneme data inclusive of triphones that can be
alternates.
[0036] In the embodiments set forth above, a case has been described
in which penalty assignment processing is executed in order of power
penalty and phoneme-duration penalty (and then element-number penalty
in the second embodiment). However, this does not impose a limitation
upon the present invention, for the processing may be executed in any
order. Further, an arrangement may be adopted in which these penalty
assignment processing operations are executed concurrently.

[0037] Further, in each of the foregoing embodiments, 2.0 points is
adopted as the penalty value for the power and phoneme-duration
penalties. However, this does not impose a limitation upon the present
invention, for it is obvious that a suitable value may be set. In
addition, equal penalties need not be applied as the penalties
relating to both characteristics.

[0038] In the second embodiment, a case in which 0.5
is set as the value of the element-number penalty is de- 
scribed. However, this does not impose a limitation upon 
the present invention, for a suitable value may be set. 
[0039] Furthermore, in each of the foregoing embodiments, a case is
described in which a penalty is assigned to the one-third of phoneme
data starting from smaller values (or to the one-third of phoneme data
starting from larger values) in regard to the sorted results. However,
this does not impose a limitation upon the present invention. For
example, it is possible to change the method of penalty assignment
depending upon the number of items of phoneme data or the properties
of the phoneme data contained in the database. In such case a penalty
may be assigned to data for which the difference relative to an
average value is greater than a threshold value.
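
The threshold-based variant mentioned here could look like the sketch
below. The function name, the use of the arithmetic mean as the
"average value", the absolute difference, and the default of 2.0
points are assumptions; the text leaves the threshold and penalty
values open.

```python
from statistics import mean


def add_threshold_penalty(items, key, threshold, points=2.0):
    """Penalize items whose `key` value differs from the mean by more than `threshold`.

    Returns the mean that was used, or None when `items` is empty.
    """
    if not items:
        return None
    average = mean(item[key] for item in items)
    for item in items:
        if abs(item[key] - average) > threshold:
            item["penalty"] = item.get("penalty", 0.0) + points
    return average
```

Unlike the rank-based rule, this form penalizes no item at all when
every value lies close to the average, and may penalize most items when
the values are widely spread.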

[0040] Further, in the foregoing embodiments, there is described a
method of selecting representative phoneme data in which the target is
a phoneme data set that satisfies a specific phoneme environment and
fundamental frequency. However, this does not impose a limitation upon
the present invention. For example, it is possible to use a phoneme
data set for which the matter of interest is solely the phoneme
environment and to adopt the fundamental frequency as a factor for
assigning a penalty.

[0041] Further, in each of the above embodiments, there is described a
method of selecting a representative phoneme unit on demand, wherein
the target is a phoneme data set that satisfies a specific phoneme
environment and fundamental frequency. However, an arrangement may be
adopted in which a phoneme lexicon is created in advance by applying
the processing of the first embodiment to all conceivable phoneme
environments and fundamental frequencies.
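
A precomputed lexicon of this kind might be built as in the sketch
below. The function name, the dictionary keys and the `select` callback
are hypothetical, and the sketch only covers the environments and
fundamental frequencies that actually occur in the database rather than
all conceivable ones.

```python
from collections import defaultdict


def build_lexicon(database, select):
    """Map each (triphone, f0) pair to its representative phoneme item.

    `select` is any function that picks one item from a non-empty list,
    for example the selection procedure of the first embodiment.
    """
    groups = defaultdict(list)
    for item in database:
        groups[(item["triphone"], item["f0"])].append(item)
    return {key: select(group) for key, group in groups.items()}
```

At synthesis time a lookup in the returned dictionary would then
replace the on-demand retrieval, penalty assignment and sorting.
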
[0042] Further, in each of the foregoing embodiments, a case is
described in which the sorting processing unit and the area for
holding the sorted results are designed for general-purpose use.
However, this does not impose a limitation upon the present invention.
For example, an arrangement may be adopted in which there is provided
a sorting processor exclusively for the processing unit that assigns
the power penalties and a sorting processor exclusively for the
processing unit that assigns the phoneme-duration penalties.

[0043] In each of the foregoing embodiments, a case in which the areas
for storing data are implemented by memory (RAM) is described.
However, this does not impose a limitation upon the present invention
because any storage media may be used.
[0044] Further, in each of the foregoing embodiments, a case in which
the components are constituted by the same computer is described.
However, this does not impose a limitation upon the present invention
because these components may be implemented by computers or processors
distributed over a network.
[0045] Further, in each of the foregoing embodiments, a case in which
a program is stored in a control memory (ROM) is described. However,
this does not impose a limitation upon the present invention because
the program may be stored in any storage media. The same operations
performed by the program may be carried out by circuitry.
[0046] The present invention can be applied to a sys- 
tem constituted by a plurality of devices or to an appa- 
ratus comprising a single device (e.g., a copier or fac- 
simile machine, etc.). 

[0047] Furthermore, it goes without saying that the invention is
applicable also to a case where the object of the invention is
attained by supplying a storage medium storing or a carrier signal
carrying the program codes of the software for performing the
functions of the foregoing embodiment to a system or an apparatus,
reading the program codes with a computer (e.g., a CPU or MPU) of the
system or apparatus from the storage medium, and then executing the
program codes.



[0048] In this case, the program codes read from the 
storage medium implement the novel functions of the 
invention, and the storage medium storing the program 
codes constitutes the invention. 

[0049] Further, the storage medium, such as a floppy 
disk, hard disk, optical disk, magneto-optical disk, CD- 
ROM, CD-R, magnetic tape, non-volatile type memory 
card or ROM can be used to provide the program codes. 
[0050] Furthermore, besides the case where the 
aforesaid functions according to the embodiment are 
implemented by executing the program codes read by 
a computer, it goes without saying that the present in- 
vention covers a case where an operating system or the 
like running on the computer performs a part of or the 
entire process in accordance with the designation of 
program codes and implements the functions according 
to the embodiments. 

[0051] It goes without saying that the present inven- 
tion further covers a case where, after the program 
codes read from the storage medium are written in a 
function expansion board inserted into the computer or 
in a memory provided in a function expansion unit con- 
nected to the computer, a CPU or the like contained in 
the function expansion board or function expansion unit 
performs a part of or the entire process in accordance 
with the designation of program codes and implements 
the function of the above embodiment. 
[0052] Thus, in accordance with the present inven- 
tion, as described above, it is possible to provide a 
speech synthesizing apparatus capable of selecting 
better phoneme units, as a result of which synthesized 
speech of superior quality can be produced. The inven- 
tion provides also a method of controlling this apparatus 
and a storage unit storing a program for implementing 
this control method. 

[0053] As many apparently widely different embodi- 
ments of the present invention can be made without de- 
parting from the spirit and scope thereof, it is to be un- 
derstood that the invention is not limited to the specific 
embodiments described above. 



Claims 

1. A speech synthesizing apparatus comprising: 

storage means for storing plural items of pho- 
neme data; 

retrieval means for retrieving phoneme data, in 
accordance with given retrieval conditions, 
from the plural items of phoneme data stored 
in said storage means; 

first penalty assigning means for assigning a 
penalty that is based upon an attribute value to 
each item of phoneme data retrieved by said 
retrieval means; and 

selection means for selecting, from the pho- 
neme data retrieved by said retrieval means, 
and based upon the penalty assigned by said
first penalty assigning means, phoneme data to 
be employed in synthesis of a speech wave- 
form. 

2. The apparatus according to claim 1, wherein said
storage means stores respective items of attribute 
information together with the plural items of pho- 
neme data; and 

said first penalty assigning means obtains an 
attribute value from the attribute information stored 
in said storage means. 

3. The apparatus according to claim 2, wherein the at- 
tribute information includes phoneme environment, 
phoneme boundary, fundamental frequency, power 
and phoneme duration. 

4. The apparatus according to any preceding claim, 
wherein said retrieval means retrieves phoneme 
data that satisfies a specified phoneme environ- 
ment. 

5. The apparatus according to any preceding claim, 
wherein said retrieval means retrieves phoneme 
data that satisfies a specified phoneme environ- 
ment and fundamental frequency. 

6. The apparatus according to any preceding claim, 
wherein said first penalty assigning means sorts re- 
trieved phoneme data based upon a prescribed at- 
tribute value and assigns a penalty value on the ba- 
sis of order obtained by sorting. 

7. The apparatus according to any preceding claim, 
wherein said first penalty assigning means assigns 
a penalty using power and phoneme duration of 
each item of phoneme data as the attribute values. 

8. The apparatus according to claim 7, wherein said 
first penalty assigning means: 

sorts the items of phoneme data in order of de- 
creasing power and assigns a power-related 
penalty on the basis of the order obtained by 
sorting, in such a manner that a small penalty 
is assigned to phoneme data whose power is 
close to an average value; and 
sorts the items of phoneme data in order of de- 
creasing phoneme duration and assigns a pho- 
neme-duration-related penalty on the basis of 
the order obtained by sorting, in such a manner
that a small penalty is assigned to phoneme da- 
ta whose phoneme duration is close to an av- 
erage value. 
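
The sort-based penalty of claims 6 to 8 can be sketched as below. The claims do not fix how a sort position is converted into a penalty, so the rule used here (distance of the rank from the middle rank, which favours items near the centre of the ordering) and the simple addition of the two penalties are assumptions for illustration only.

```python
def rank_penalties(values):
    """Sort item indices in order of decreasing value and penalize each
    item by the distance of its rank from the middle rank (assumed rule)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    middle = (n - 1) / 2
    penalties = [0.0] * n
    for rank, idx in enumerate(order):
        penalties[idx] = abs(rank - middle)
    return penalties


# Hypothetical power and phoneme-duration values for five retrieved items.
powers = [30.0, 10.0, 14.0, 22.0, 18.0]
durations = [70.0, 95.0, 60.0, 85.0, 80.0]

power_pen = rank_penalties(powers)
duration_pen = rank_penalties(durations)

# Assumed combination: add the power-related and duration-related penalties.
total = [p + d for p, d in zip(power_pen, duration_pen)]
best = min(range(len(total)), key=total.__getitem__)
```

In this example the last item sits in the middle of both orderings, so it receives a total penalty of zero and is the one selected.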

9. The apparatus according to any preceding claim, 
further comprising: 



alternate retrieval means for retrieving phoneme
data that satisfies some of the retrieval
conditions in a case where phoneme data that
conforms to the retrieval conditions in said
retrieval means does not exist;

counting means for grouping phoneme data,
which has been retrieved by said alternate
retrieval means, on the basis of a phoneme
environment, and counting the items of phoneme
data on a per-group basis; and

second penalty assigning means for assigning
a penalty on the basis of a count obtained by
said counting means to the phoneme data
retrieved by said alternate retrieval means, this
penalty being assigned in addition to the
penalty assigned by said first penalty assigning
means.

10. The apparatus according to claim 9, wherein the
retrieval conditions include phoneme environment;
and

said alternate retrieval means retrieves phoneme
data which agrees with part of a phoneme
environment specified in the retrieval conditions.


11. The apparatus according to claim 10, wherein the
phoneme environment specified in the retrieval
conditions is a triphone composed of an applicable
phoneme and phonemes on both sides thereof; and

said alternate retrieval means retrieves phoneme
data for which the applicable phoneme and
its left side phoneme agree with the retrieval
conditions, or phoneme data for which the applicable
phoneme and its right side phoneme agree with the
retrieval conditions.
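
The fallback path of claims 9 to 11 can be sketched as follows. Each stored item is reduced to a `(left, centre, right)` triphone tuple for brevity. The claims state only that the second penalty is based on the per-group count; the direction chosen here (groups with fewer items receive a larger penalty) is an assumption, as are all function names.

```python
from collections import Counter


def alternate_retrieve(store, triphone):
    """Return items whose centre phoneme and left neighbour match the
    target, or whose centre phoneme and right neighbour match it."""
    left, centre, right = triphone
    return [t for t in store
            if t[1] == centre and (t[0] == left or t[2] == right)]


def retrieve(store, triphone):
    """Try the full triphone first; fall back to partial agreement
    only when no exact match exists."""
    exact = [t for t in store if t == triphone]
    if exact:
        return exact, False
    return alternate_retrieve(store, triphone), True


def count_penalties(items):
    """Group items by phoneme environment, count each group, and
    penalize items from smaller groups more (assumed mapping)."""
    if not items:
        return []
    counts = Counter(items)
    largest = max(counts.values())
    return [largest - counts[t] for t in items]


store = [
    ("a", "k", "i"), ("a", "k", "u"), ("a", "k", "u"),
    ("o", "k", "e"), ("e", "k", "i"), ("a", "s", "i"),
]
target = ("a", "k", "e")  # no exact match in the store
items, used_fallback = retrieve(store, target)
# The count-based penalty applies only to fallback results and would be
# added to the attribute-based penalty of each item.
second_penalties = count_penalties(items) if used_fallback else [0] * len(items)
```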

12. A speech synthesizing method comprising:

a storage step of storing plural items of
phoneme data;

a retrieval step of retrieving phoneme data, in
accordance with given search retrieval
conditions, from the plural items of phoneme data
stored at said storage step;

a first penalty assigning step of assigning a
penalty that is based upon an attribute value to
each item of phoneme data retrieved at said
retrieval step; and

a selection step of selecting, from the phoneme
data retrieved at said retrieval step, and based
upon the penalty assigned at said penalty
assigning step, phoneme data employed in
synthesis of a speech waveform.

13. The method according to claim 12, wherein said
storage step stores respective items of attribute
information together with the plural items of phoneme
data; and

said first penalty assigning step obtains an
attribute value from the attribute information stored at
said storage step.

14. The method according to claim 13, wherein the
attribute information includes phoneme label,
phoneme boundary, fundamental frequency, power and
phoneme duration.

15. The method according to any of claims 12 to 14,
wherein said retrieval step retrieves phoneme data
that satisfies a specified phoneme environment.

16. The method according to any of claims 12 to 15,
wherein said retrieval step retrieves phoneme data
that satisfies a specified phoneme environment and
fundamental frequency.

17. The method according to any of claims 12 to 16,
wherein said first penalty assigning step sorts
retrieved phoneme data based upon a prescribed
attribute value and assigns a penalty value on the
basis of order obtained by sorting.

18. The method according to any of claims 12 to 17,
wherein said first penalty assigning step assigns a
penalty using power and phoneme duration of each
item of phoneme data as the attribute values.

19. The method according to claim 18, wherein said first
penalty assigning step:

sorts the items of phoneme data in order of
decreasing power and assigns a power-related
penalty on the basis of the order obtained by
sorting, in such a manner that a small penalty
is assigned to phoneme data whose power is
close to an average value; and
sorts the items of phoneme data in order of
decreasing phoneme duration and assigns a
phoneme-duration-related penalty on the basis of
the order obtained by sorting, in such a manner
that a small penalty is assigned to phoneme
data whose phoneme duration is close to an
average value.

20. The method according to any of claims 12 to 19,
further comprising:

an alternate retrieval step of retrieving
phoneme data that satisfies some of the retrieval
conditions in a case where phoneme data that
conforms to the retrieval conditions at said
retrieval step does not exist;

a counting step of grouping phoneme data,
which has been retrieved at said alternate
retrieval step, on the basis of a phoneme
environment, and counting the items of phoneme data
on a per-group basis; and

a second penalty assigning step of assigning a
penalty on the basis of a count obtained at said
counting step to the phoneme data retrieved at
said alternate retrieval step, this penalty being
assigned in addition to the penalty assigned at
said first penalty assigning step.

21. The method according to claim 20, wherein the
retrieval conditions include phoneme environment;
and

said alternate retrieval step retrieves phoneme
data which agrees with part of a phoneme
environment specified in the retrieval conditions.

22. The method according to claim 21, wherein the
phoneme environment specified in the retrieval
conditions is a triphone composed of an applicable
phoneme and phonemes on both sides thereof; and

said alternate retrieval step retrieves phoneme
data for which the applicable phoneme and
its left side phoneme agree with the retrieval
conditions, or phoneme data for which the applicable
phoneme and its right side phoneme agree with the
retrieval conditions.

23. A storage medium storing a control program for
causing a computer to execute speech synthesis
using phoneme data, said control program having:

code of a storage step of storing plural items of
phoneme data;

code of a retrieval step of retrieving phoneme
data, in accordance with given search retrieval
conditions, from the plural items of phoneme
data stored at said storage step;

code of a first penalty assigning step of assigning
a penalty that is based upon an attribute value
to each item of phoneme data retrieved at
said retrieval step; and

code of a selection step of selecting, from the
phoneme data retrieved at said retrieval step,
and based upon the penalty assigned at said
first penalty assigning step, phoneme data
employed in synthesis of a speech waveform.

24. The storage medium according to claim 23, wherein
said control program further has:

code of an alternate retrieval step of retrieving
phoneme data that satisfies some of the retrieval
conditions in a case where phoneme data
that conforms to the retrieval conditions at said
retrieval step does not exist;

code of a counting step of grouping phoneme
data, which has been retrieved at said alternate
retrieval step, on the basis of a phoneme
environment, and counting the items of phoneme
data on a per-group basis; and

code of a second penalty assigning step of
assigning a penalty on the basis of a count
obtained at said counting step to the phoneme
data retrieved at said alternate retrieval step, this
penalty being assigned in addition to the
penalty assigned at said first penalty assigning
step.

25. A speech processing apparatus comprising:

storage means for storing data for a plurality of
portions of speech;

means for retrieving plural portions of speech
data from said storage means in accordance
with predetermined retrieval conditions;

means for assigning a weighting to each of the
retrieved portions of speech data based upon
an attribute value associated with the
respective portions of speech data; and

means for selecting one of said plural portions
of speech data retrieved from said storage
means based upon the weightings assigned to
said plural portions of speech data.


26. A speech synthesizing apparatus comprising:

means for storing plural portions of speech
data;

means for retrieving plural portions of the
speech data stored in said storage means, in
accordance with predetermined retrieval
conditions;

first penalty assigning means for assigning a
respective first penalty to each of said plural
speech data portions retrieved from said
storage means which is based upon a first attribute
of the corresponding speech portion;

second penalty assigning means for assigning
a respective second penalty to each of said
plural portions of speech data retrieved from said
storage means based upon a second attribute
of the corresponding portion of speech data;

means for combining the respective first and
second penalties for each speech data portion
to generate a respective combined penalty for
each of the retrieved portions of speech data;

selection means for selecting one of the
portions of speech data from the plural portions of
speech data retrieved by said retrieval means
based upon the combined penalties calculated
by said combining means; and

means for synthesizing an acoustic speech
signal from the selected portion of speech data.
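
The combining and selection means of claim 26 can be illustrated with a short sketch. The claim does not say how the two penalties are combined; a weighted sum is assumed here, and the weights `w1` and `w2` are hypothetical parameters introduced only for the example.

```python
def combine(first, second, w1=1.0, w2=1.0):
    """Combine per-item first and second penalties (assumed: weighted sum)."""
    return [w1 * a + w2 * b for a, b in zip(first, second)]


def select_index(combined):
    """Return the index of the item with the smallest combined penalty."""
    return min(range(len(combined)), key=combined.__getitem__)


# Hypothetical penalties for three retrieved portions of speech data.
first = [2.0, 0.5, 1.0]    # based on a first attribute
second = [0.0, 1.0, 0.25]  # based on a second attribute

combined = combine(first, second)
chosen = select_index(combined)
```

Changing the weights changes the outcome: with equal weights the third portion is selected, whereas ignoring the first penalty (`w1=0.0`) selects the first portion.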


27. Processor implementable instructions for controlling
a processor to implement the method of any
one of claims 12 to 22.